Thirty Years of Numerical Taxonomy

Authors

  • P. H. A. Sneath
Abstract

In this history of numerical taxonomy since the publication in 1963 of Sokal and Sneath's Principles of Numerical Taxonomy, I include reminiscences of the reactions of biologists in Britain and elsewhere. Much of the original program has proved sound. The debate on phenetic and phylogenetic classifications has been valuable, although the logical and theoretical aspects of phenetics have been greatly overlooked in the debate. Hennigian cladistics, however, is a side issue that has not proven its value. Numerical taxonomy in the broad sense is the greatest advance in systematics since Darwin or perhaps since Linnaeus. It has stimulated several new areas of growth, including numerical phylogenetics, molecular taxonomy, morphometrics, and numerical identification. It has wide application outside systematic biology. Landmarks and trends are important aspects of numerical taxonomy. In microbiology, the program of numerical taxonomy has been successful, as indicated by the preponderance of papers describing numerical relationships in the International Journal of Systematic Bacteriology. This review concludes with comments on the needs and prospects of the future. [Numerical taxonomy; phenetics; phylogenetics; cladistics; molecular taxonomy; numerical identification; morphometrics; bacterial systematics.]

This contribution is a highly personal account of the history of numerical taxonomy since 1963 and reflects my own emphasis on microbiology, where the methodology has been most useful. I have, however, tried to do justice to major contributors to wider aspects and have included some discussion of the growth of numerical taxonomy in various new areas.

Principles of Numerical Taxonomy, published in 1963 (Sokal and Sneath, 1963), was obviously a preliminary exposition of a new field, but numerical taxonomy was clearly defined to include the drawing of phylogenetic inferences from the data to the extent that this was possible (Sokal and Sneath, 1963:48). Numerical taxonomy has therefore a broader scope than phenetics, just as phylogenetics has a broader scope than Hennigian cladistics. There are frequent misconceptions on this score. I was surprised to be asked by an editor to add to the title of a review of numerical analysis of molecular sequences in bacterial systematics (Sneath, 1989) the phrase "the view of a numerical taxonomist" (one would think it was self-evident; what else could it be?). Perhaps I should have insisted that this phrase read "the view of one numerical taxonomist on the findings of other numerical taxonomists." There may therefore be some ambiguity in the meaning of the term "numerical taxonomist" between the original broad sense of those who use any quantitative computer methods and the narrow sense of a discernable group of systematists who practice numerical phenetics. In the present review, the first, broad sense is intended in most instances, but I have tried to clarify this where confusion may arise.

The development of numerical taxonomy (in the broad sense) has not followed very closely the early program (Jensen, 1993), except perhaps in microbiology. Yet today all systematics is to some extent numerical. Computers and numerical taxonomic programs are now standard resources in every museum and systematics laboratory. Workers in all branches of systematics and comparative biology need to be familiar with phenetic and cladistic methods, and these methods are critical for many problems in classification (Jensen, 1993).

The idea of quantifying relationships goes back to the last century. No consistent system, however, had been developed for character choice, coding, and weighting, for evaluating resemblance, for grouping organisms and constructing a classification, or for setting up sound identification systems. The 1963 book provided the outline of these procedures, together with explicit principles of phenetics and a synopsis of possible numerical approaches to phylogeny.
It emphasized the distinction between phenetic and cladistic relationships, first clearly stated by Cain and Harrison (1960). The availability of computers made the program practicable, as noted by Robert Sokal in a contribution (Sokal, 1985b) to a symposium of the Society for General Microbiology in 1983.

Robert Sokal and I met in Lawrence, Kansas, in 1959. Sir Christopher Andrewes, the distinguished virologist and amateur entomologist, was working at the National Institute for Medical Research in London when I was there, and one day in the summer of 1958 he looked into my laboratory to say that there was another person who was mad enough to think that taxonomy could be mathematical and gave me an abstract of a paper that Sokal had given to the recent Entomology Congress (Sokal, 1958). This was a time when I was attempting to work out a logical way to classify bacteria, and in 1959, when I was on a Rockefeller Fellowship in the USA, there was an opportunity to visit Robert Sokal in Lawrence (Hull, 1988:122). We soon set up a close collaboration, which flourished when Sokal visited University College, London, for a year.

Other early contributions came from Arthur Cain and Geoffrey Harrison (Cain, 1958, 1959; Cain and Harrison, 1958) and from David Rogers and Taffee Tanimoto (Rogers and Tanimoto, 1960), of whose team George Estabrook is a distinguished student. Also, John Gilmour (1937, 1940, 1951) provided some key concepts of information content and predictivity, adapted from the Victorian philosophers of science John Stuart Mill and William Whewell (Mill, 1886; Whewell, 1840). George Gaylord Simpson was also a key figure, not so much because of his contributions to methodology but because of his exceptionally clear thought about the several areas within systematics and the problems confronting them (Simpson, 1944, 1961). Simpson's book Tempo and Mode in Evolution (1944) had a major influence on both Sokal and myself.

SYSTEMATIC BIOLOGY VOL. 44 (1995)
AIMS AND ASSUMPTIONS OF NUMERICAL TAXONOMY

Our 1963 book had to start from fundamentals (homology, character weighting, overall similarity, information content of groups); without these painstaking considerations the later developments would have had no basis (at the time, these were difficult problems; see Sneath and Sokal, 1973:418). One topic, homology, remains almost as intractable as ever from the formal standpoint, although I believe the solution must lie along lines we mapped out (Sokal and Sneath, 1963:69-74), following Woodger (1945), and further elaborated by Jardine (1967, 1969).

In his historical review, Vernon (1988) said that the major factor in the new thinking was a healthy scepticism of taxonomic dogma. He likened it to the Renaissance questioning of medieval dogma. Numerical taxonomy in the broad sense (i.e., including both phenetic and phylogenetic approaches) has been the greatest advance in systematics since Darwin or (because Darwin had relatively little effect on taxonomic practice [Hull, 1988:101]) since Linnaeus. Much was swept away. The method of division from above, the hierarchy of characters, and the primacy of differential characters or of functional characters are little heard of today (Sneath, 1991).

How did distinguished scientists view the program? Hull (1988) summarized reactions in North America. I discussed the early broad concepts with a number of scientists in Britain and elsewhere, and my recollections (perhaps not entirely accurate after so many years) are as follows.

The population geneticists were usually open minded. J. B. S. Haldane thought it was a good idea and should be tried out. Sewall Wright said much the same. Ronald Fisher agreed but was concerned (as was characteristic of him) that there was no exact statistical basis. Evolutionary biologists gave it a more mixed reception.
Julian Huxley felt that characters of evolutionary importance must in some way be given great weight, although how this can be achieved before one has first made a classification or phylogeny is no clearer today. He seemed to be concerned that numerical taxonomy was in some way against evolutionary theory. Certainly many systematists (with the exception of microbiologists, who had few preconceptions about bacterial evolution) gave me the impression that because our proposals did not presuppose phylogenetic judgements, they must be antievolutionary. They seldom understood that these proposals could lead to techniques by which one could actively explore phylogeny.

George Gaylord Simpson, whom I had the privilege of visiting (guarded by Anne Roe, who was afraid I wanted to pick a quarrel), was surprisingly receptive. As long as there was a place for phylogenetics, he had no objection to the new concepts as a first step. He thought we were too optimistic in thinking that phenetics would yield results of value in initially determining the outlines of phylogeny, but he did not deny either the possibility or the logic. Simpson had a penetrating mind and clearly appreciated the distinction between phenetic and cladistic relationships. He had a keen awareness of the problems posed by clades and grades, by rapid radiation, and by the slow evolution of "living fossils," and he pioneered the estimation of evolution rates (Simpson, 1944, 1961).

The botanist John Gilmour, although intellectually supportive, was surprisingly dismissive of numerical taxonomy on the practical side. He felt the employment of computers was using a sledge hammer to crack a nut; he never appreciated the value of quantitation of relationships in general nor the intractable problems of bacteriology in particular. Yet he was active in the early years of the British Classification Society, of which he was the founder.
This was a time when numerical taxonomic concepts were being explored in fields outside biology, and he viewed these developments with excitement.

The most distinguished bacterial taxonomists of the time had varied views. At that time there were insufficient data for phylogenetic endeavors, so the debate concerned phenetic concepts. Cornelius Bernhardt van Niel was greatly interested in biochemistry (van Niel, 1946). He thought that numerical taxonomies might not reflect the wonderful variety of metabolic pathways in bacteria and the great importance of these pathways. Samuel Cowan, a medical diagnostic bacteriologist of radical and innovative opinions (Cowan, 1970), was very supportive provided the end result was better bacterial identification. Robert Earle Buchanan was principally interested in bacterial nomenclature, but he approved of the program because it might bring some order into a chaotic field and encouraged workers such as Beers and Lockhart (1962) to explore its new techniques. But almost all microbiologists (who had practical problems to solve and few prejudices) gave the program a welcome.

If there is a general conclusion to be drawn from the reactions of biologists, it seems to be that they all wished to defend their own special interests. The more perceptive among them could see that numerical taxonomy was seldom a threat and often an asset.

The early aims and assumptions of numerical taxonomy were listed by Sokal and Sneath (1963:49-50, 84-91, 111-115): (1) the aims of repeatability and objectivity, (2) the use of quantitative measures of resemblance from numerous equally weighted characters, (3) the construction of taxa from character correlations leading to groups of high information content, and (4) the separation of phenetic and phylogenetic considerations. The subject was viewed as an empirical science. Four major assumptions underlay these aims: the nexus, nonspecificity, factor asymptote, and matches asymptote hypotheses.
How have the early aims and assumptions fared? Some aspects have stood up well. The emphasis on reliability, which requires the use of numerous characters and implies a statistical outlook, has paid off. Workers on identification methods are well aware of this, and phylogeneticists are beginning to appreciate it. Character weighting is not greatly disputed now. Scaling of characters so that each item of relevant information carries unit weight is widely accepted. The power of overall similarity measures to construct taxonomic groups, to determine evolutionary relationships, and to support identification has been amply borne out, even if somewhat different forms of similarity may be needed for different purposes.

The complexity of phenetic and genomic relationships was not foreseen and weakens support for the factor asymptote hypothesis, which assumes that most of the variation will be recovered by only a moderate number of characters. There has been little work specifically on this hypothesis. Nevertheless, practical experience suggests that it holds fairly well.

The number of characters needed for reliable results was perhaps underestimated by the matches asymptote hypothesis, which postulates that as the number of characters evaluated increases, the similarity between two organisms will settle near a parametric value. There is always some residual discordance even with large numbers of characters; this discordance is not yet well understood. There have always been reservations on whether there are parametric measures of resemblance (although the other assumption, that character sets are sufficiently random, has not been a major problem). Nevertheless, the hypothesis has proven useful, and it has forced attention on the need to explain the exceptions that arise, notably when studying incongruence between character sets.
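The matches asymptote hypothesis described above can be illustrated with a small simulation. What follows is only a minimal sketch, not a procedure taken from the 1963 book: it assumes binary characters, uses the simple matching coefficient as the measure of resemblance, and generates two hypothetical organisms whose characters agree independently with a fixed probability (the assumed parametric value, here 0.8). The function names and the simulated data are illustrative assumptions.

```python
import random


def simple_matching(a, b):
    """Fraction of characters on which two organisms agree."""
    if len(a) != len(b) or not a:
        raise ValueError("character vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def simulate_pair(n_characters, p_match, rng):
    """Hypothetical data: two binary vectors that agree with probability p_match."""
    a = [rng.randint(0, 1) for _ in range(n_characters)]
    b = [x if rng.random() < p_match else 1 - x for x in a]
    return a, b


rng = random.Random(0)  # fixed seed so the sketch is repeatable
a, b = simulate_pair(2000, 0.8, rng)

# Re-estimate the similarity from progressively larger character sets.
for n in (10, 50, 200, 1000, 2000):
    print(n, round(simple_matching(a[:n], b[:n]), 3))
```

Under these assumptions the estimate fluctuates for small character sets and steadies as more characters are added. Real data, as noted above, retain some residual discordance that a model of independent, randomly sampled characters does not capture.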
The nexus hypothesis, that most phenotypic characters are affected by many genes and most genes affect many characters, has held fairly well. The remaining hypothesis, the nonspecificity hypothesis, is that there are no large classes of genes that determine exclusively one class of characters. This hypothesis has not held up so well (Rohlf, 1965). The congruence between similarities or classifications from different subsets of characters is consistently less than that from randomly chosen subsets; many examples were tabulated by Sneath and Sokal (1973:100-102). Yet this is precisely what one would expect from biological data. The incongruence must consist of two nonzero parts: (1) that due to random sampling error (because the characters do not form an infinite population) and (2) that due to other factors, mainly biological no doubt but with contributions from measurement error and the like. The second part is commonly small, but if not, it may indicate important factors such as phenotypic plasticity, expression of different parts of the genome, or different selection pressures on different stages of the life cycle. This area is ripe for further study. One of the contributions of Hennig (1950) was to draw attention to its implications for phylogeny. He pointed out that larval and adult insects had the same phylogeny; therefore phylogenetic methods must not be misled by such incongruences.

Such hypotheses are not just true or false: the question is to what extent they hold. There is a parallel in the "nature versus nurture" controversy in the social sciences. One hypothesis is that individual achievement is due to genetic factors; another is that it is due to environmental factors. A sophomore who believed that one hypothesis must be true and the other false would have difficulty in getting into a college psychology program. Such hypotheses also have the function of null hypotheses in statistics, to set a model against which observations can be judged.
The understandable desire of biologists to obtain the "right" answers in taxonomy has also obscured the fact that we must state our aims in detail and that we can only obtain the "best" answers from the available information (equally true for phenetic and phylogenetic approaches).

The concepts of repeatability and objectivity are sometimes criticized, although no one advocates their opposites. When repeatability and objectivity levels have actually been tested, they have usually been acceptable, despite the philosophical criticisms. Such tests have occurred most often in microbiology (see several contributions collected by Goodfellow et al., 1985). We noted (Sneath and Sokal, 1973:431) that clear and apparently acceptable numerical results have seldom been contradicted by later work. This is still true. The objection that numerical taxonomic results have seldom contradicted earlier traditional views is not correct for situations where the earlier knowledge was poor (e.g., birds, bacteria). In any event, quantitation is a desirable feature in science.

One problem has, I believe, been effectively dealt with. The question was whether phenetic clustering yields groups that have maximum information content and are maximally predictive (Farris, 1977). It has now been shown that this can be true. Variance clustering of the simple matching coefficient maximizes the square of predictivity (Sneath and Hansell, 1985). Even this problem, however, is not simple; Gower (1974) showed that one can maximize separately the within-cluster predictivity and the between-cluster predictivity, and it is not always clear which of the two is the more useful in taxonomy.
THE CONTROVERSY ON PHENETIC AND CLADISTIC APPROACHES

The debate on whether biological classifications must be phenetic or cladistic has died down somewhat as it is increasingly realized that both have their advantages and both present problems, at the theoretical and practical levels. They have different goals: phenetics aims to give information-rich groups, and phylogenetics aims to reconstruct evolutionary history. Both are important in systematics.

Phenetic groupings can be verified by phenetic criteria but cannot be proven to correspond to reality, whereas phylogenetic groupings must correspond to reality but cannot be verified, or so it is sometimes said. But consider the classification of chemical elements in the periodic table by Mendeleev and others in the 19th century. Groups such as halogens or noble gases (these can be constructed numerically; Sneath, 1988, 1991) are phenetic (certainly not phylogenetic); are they real or not? This example illustrates three important points. First, information-rich groups do not necessarily have a historical basis. Second, to reconstruct history one must have models of how history operates. Third, all aspects involve theory and prior assumptions.

These groups of elements involve at least four types of theory: (1) theory of homology (i.e., what should be compared with what), (2) theory of information-rich groupings (which implies high predictivity [Mill, 1886; Gilmour, 1940]), (3) Mill's (1886) theory of general causes (i.e., phenetic groups are due to as yet undiscovered causes, which in the 1920s were discovered to be the electron shells of atoms [van Spronsen, 1969]), and (4) theory of history (i.e., theory of atomic transformations [Viola and Matthews, 1987]). The first three are theories of phenetics; the last is the theory of history.
It is clear, therefore, that it is perverse to imply phenetics is theory free or that phylogeny requires no models of evolution (points that have regrettably been misunderstood by philosophers of science [Hull, 1988; Sober, 1988; Scott-Ram, 1990]).

In preparing this review, I came across notes of a seminar that I gave at the London School of Hygiene and Tropical Medicine on 5 November 1956, which contain the headings "Classification of microorganisms: a problem in logic and mathematics" and "Content of information equals predictive value," as well as similarity matrices and a tree. Such matters were clearly central to rethinking the bases of systematics.

We did not, I think, emphasize sufficiently Mill's theory of general causes in the early days, which would have linked numerical taxonomy more closely to Darwin's great work. One of Darwin's strongest arguments was that the observed nested hierarchy of living organisms (phenetic, even though it was an intuitive rather than numerical phenetic system) required an explanation and had a general cause. This cause was descent with modification, or, in a word, evolution. In chapter 14 of the Origin of Species, Darwin (1886:364) said,

"No doubt organic beings, like all other objects, can be classed in many ways, either artificially by single characters or more naturally by a number of characters. We know, for instance, that minerals and the elemental substances can be thus arranged. In this case there is of course no relation to genealogical succession, and no cause can at present be assigned for their falling into groups. But with organic beings the case is different, and the view above given accords with their natural arrangement in group under group; and no other explanation has ever been attempted."

According to Peckham (1959:648), most of this passage was added in the fourth edition, published in 1866.
It was added to strengthen the sentence (slightly modified) in the first edition (1859), "The grand fact of the natural subordination of organic beings in groups under groups which, from its familiarity, does not always sufficiently strike us, is in my judgement thus explained." Darwin was perhaps thinking of chemical elements when referring to elemental substances, because the periodic system developed by Newlands, Meyer, and Mendeleev was then viewed as a striking and exciting achievement (van Spronsen, 1969). The reference to natural arrangement may well refer to Mill, who probably introduced the term "natural kind" (Hull, 1988:78, footnote).

It is common to refer to three taxonomic philosophies: phenetic, phylogenetic (cladistic), and evolutionary. All of them can of course be numerical. The third, evolutionary systematics, has not attracted much discussion in recent years. Its concepts were well described by Mayr (1969) and have been discussed by Sneath and Sokal (1973:421-423) and Hull (1988:107-109, 520). One reason for the lack of debate may be that its concepts are not easily explained in methodological terms: characters should be weighted according to their evolutionary importance, yet rules for this are lacking; phylogeny should be considered as an important basis, but monophyletic groups are not insisted upon; criteria for defining biological species may be philosophically sound yet are difficult to apply. These considerations were a major reason that our general understanding of evolutionary processes was not incorporated into the early endeavors of numerical taxonomy. Evolutionary systematics could not easily advance before the availability of molecular data because so little numerical information was then available, and it still has not made noticeable progress. Nevertheless, some of its problems still need answering.

HENNIGIAN CLADISTICS

The rise of interest in Hennigian cladistics (Hennig, 1950, 1966) was to me very surprising.
It seemed obvious that if one could be certain of evolutionary homologies and ancestral and descendant character states then the reconstruction of phylogeny would be straightforward. But I found it hard to believe that these homologies and states could be determined in the naive fashion that was proposed. Admittedly, phenetics had problems in exact definition of homology in the general and nonevolutionary sense, i.e., to determine what should be compared with what (length with length, breadth with breadth, etc.). This question cannot be evaded. Thus, in the context of classifying chemical elements, to determine the relationship of gold and silver one must decide whether the melting point of silver fluoride should be compared with the melting point of gold fluoride and not with the boiling point of gold fluoride or the melting point of gold chloride. Consistent theory here is a problem for all scientific comparisons, not only phenetics. But to assume that one could solve evolutionary homology without even considering general homology and, furthermore, to determine ancestral and descendant character states a priori appeared to me impossible, indeed quixotic, a veritable Mambrino's helmet (admirers of Cervantes will remember that what to Don Quixote was the golden helmet of the hero Mambrino was to everyone else the barber's brass basin, which he had put over his new hat to keep off the rain; Don Quixote, I, ch. xxi).

It seems that the immediate appeal of cladistics was its apparent simplicity. Hull (1988:519, 520) said, "Numerical taxonomists offered a plethora of techniques, each with its own strengths, each with its own weaknesses. . . . The cladists presented systematists with a method, one method, and they could use it without becoming experts." The method referred to is Hennig's method of synapomorphies (Hennig, 1966).
An added attraction, no doubt, was that it was not numerical (or at least the earlier algorithms for finding approximate minimum-length trees were simpler than some phenetic methods). There was no need to use numerous characters, and one could reconstruct exact phylogenies instead of having to estimate approximate ones.

The argument of Hennig was that shared descendant character states, synapomorphies, will yield a cladogram with certainty, provided these states can be found. The difficulty is to find them. One cannot reconstruct phylogeny from synapomorphies if one must first know the phylogeny to recognize correctly the synapomorphies. This is logically the same as trying to select the discriminatory characters before knowing which groups are to be discriminated. One must not assume the answer in advance; to do this is not science. Phenetics may have to assume the general homologies in advance, but it does not assume the groups and then choose or alter the data so as to fit these groups. There was a strong temptation in early Hennigian studies to justify the techniques with this kind of circular argument.

Hull (1984) perceptively noted that the problem of paraphyletic groups is not easily disposed of by Hennigian cladistics. Platnick (1979) asserted that the reason taxa seem polythetic is that the traits used are being misidentified as synapomorphies. This is an echo of an earlier argument by Remane (1956; see Sokal and Sneath, 1963:99; Sneath and Sokal, 1973:49). Yet Hull noted that this approach is close to censoring unpalatable findings. And to claim that groups cannot be polythetic is hard to credit for molecular data, where amino acids and nucleotides can scarcely be misidentified.

These problems soon became obvious (Colless, 1969). Yet if one introduces qualifications that synapomorphies are only provisional hypotheses, recodes data to remove homoplasy, or appeals to parsimony and the like, the technique becomes iterative and certainly no longer simple.
Many of these points were well discussed by Jensen (1983).

The concepts of Hennig were greatly expanded by his followers (Hull, 1984, 1988:244-251). But the major expositions (Eldredge and Cracraft, 1980; Nelson and Platnick, 1981; Wiley, 1981) do not in my view cover the problems adequately. The simple three-taxon method of Nelson and Platnick, the "rule of D," is not workable (Sneath, 1982). Recent attempts to revive the three-taxon method were heavily criticized by Harvey (1992). No one answered my challenge to produce a simple Hennigian technique for data where relationships between organisms were not known or implied (Sneath, 1982:211 [first table]). If there were a simple technique there would be no arguments about phylogenies or synapomorphies, whereas such disputes continue as before.

There are virtually no Hennigian analyses in microbiology or molecular taxonomy, where there are few preconceptions of the relationships of the organisms. The one attempt by Hennigian methods to determine a problem of major significance in microbiology, the relationships of blue-green algae to bacteria and eukaryotes by Humphries and Richardson (1980), was merely a reiteration of traditional views based on a few allegedly important photosynthetic pigments (Sneath, 1988, 1989). By this analysis blue-green algae were most closely related to green plants, not bacteria. It gave no hint of the dramatic findings from molecular data by Woese (1981, 1987), which indicated that blue-green algae, together with chloroplasts and mitochondria, belong to the eubacterial clade. Rogstad (1991), in reviewing a collection of essays in molecular evolution (Selander et al., 1991), bewailed the absence of any cladograms (synapomorphograms) in the book.

Hennigian cladistics will I think fade, at least in its simplistic form, although it is not clear what will happen to pattern cladism, which I for one do not understand (others also find it obscure [e.g., Hull, 1984; Scott-Ram, 1990]).
I find it hard to envisage its useful products. One hopes that "cladism" will not lead to a generation of mistaught systematists.

The Hennigian debate, nevertheless, has led to some profitable insights. It has led to a clearer understanding of the difficulty of inferring common ancestors from character distributions alone and of translating phylogenies into hierarchic classifications. It has renewed interest in nested hierarchies of character states. The sister-group and outgroup concepts are useful ones. Synapomorphograms can be effective summaries of salient evolutionary stages in palaeontology. Further, if it were possible to discover classes of characters that seldom show incongruence between different stages of the life cycle, it would be an important advance toward finding reliable synapomorphies.

Four areas of numerical taxonomy that were only faintly foreshadowed in 1963 have developed greatly: numerical phylogenetics, molecular taxonomy, numerical identification, and morphometrics.

Numerical Phylogenetics

A great deal of work has been done on numerical methods for reconstructing phylogenies. The earliest work was by Edwards and Cavalli-Sforza (1964) and Camin and Sokal (1965). It would be a large task to review later developments, so I have chosen only what seem to me the main conceptual steps.

First, it was realized that one could not rely upon single characters to determine the branching pattern, however significant they appeared to be. Therefore, numerical similarities of some kind (or dissimilarities, evolutionary distances) based on many characters were required (Edwards and Cavalli-Sforza, 1964; Kidd and Cavalli-Sforza, 1971).
Alternatively, one needed evolutionary compatibility among many characters as developed first by Le Quesne (1969) and extended by others (Estabrook, 1972; Estabrook and Landrum, 1975; Sneath et al., 1975; Estabrook et al., 1976; Estabrook and McMorris, 1977) to multistate and undirected characters.

Second, it was found necessary to introduce onto trees internal nodes that represent putative ancestors (Edwards and Cavalli-Sforza, 1964; Camin and Sokal, 1965; Dayhoff et al., 1965; Fitch and Margoliash, 1967). This realization led to concepts of parsimonious evolution, minimum-length trees, and the mathematics of Steiner and Wagner trees, to which Farris (1970, 1971, 1972; Farris et al., 1970) made notable contributions. Similar work occurred in character compatibility (clique) analysis (Le Quesne, 1972; Estabrook et al., 1976; Estabrook and Meacham, 1979; Meacham, 1981). The study of internal nodes also led to the understanding (Day, 1983) that most numerical methods for phylogeny are NP-complete, that is, there is no way of being certain one has discovered the optimal tree except by examining every one of the possible tree topologies, and even then the optimal tree may not be the true tree. Because the number of such topologies is extremely large even for quite modest numbers of organisms, the optimal solution is often impracticable, despite notable advances in computing. This raises the question: should one accept a suboptimal solution when one does not know how good it is?

Third, it was gradually realized, and most clearly explained by Felsenstein (1982, 1983b), that the reconstruction of phylogeny is a statistical problem and requires assumptions of how evolution occurs. This realization allowed the development of maximum-likelihood methods (Felsenstein, 1973).
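The remark above that the number of tree topologies is extremely large even for modest numbers of organisms can be made concrete. The sketch below is an illustration only; it relies on the standard counting result for labeled, strictly bifurcating trees, (2n - 3)!! rooted topologies and (2n - 5)!! unrooted topologies for n organisms, which is assumed here and is not stated in the text. The function names are hypothetical.

```python
def count_rooted_topologies(n_taxa):
    """Rooted, strictly bifurcating, labeled trees: (2n - 3)!! for n >= 2."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


def count_unrooted_topologies(n_taxa):
    """Unrooted, strictly bifurcating, labeled trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    # An unrooted tree on n taxa corresponds to a rooted tree on n - 1 taxa.
    return count_rooted_topologies(n_taxa - 1)


for n in (4, 10, 20, 50):
    print(n, count_rooted_topologies(n))
```

Under this assumption, 10 organisms already admit 34,459,425 rooted topologies, which is why exhaustive search for the optimal tree quickly becomes impracticable.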
It also led to the important observation (Felsenstein, 1981) that when rates of evolution are steady over characters, minimum-length methods will perform better than character compatibility methods, whereas when rates differ greatly among characters, compatibility methods will be better than minimum-length methods. This is a formal equivalent of the belief that character compatibility analysis reduces the effect of characters that behave erratically during evolution (Le Quesne, 1982). The question of suboptimal solutions can now be addressed, because it becomes clear that all phylogenies are to some extent uncertain but that methods can be found for estimating this uncertainty. The implication is that it is not worth seeking for a solution that is algorithmically optimal if this solution is too uncertain because of the inherent statistical properties of the data. We still need better guidelines here.

Molecular Systematics

With commendable foresight, Francis Crick (1958) predicted the emergence of molecular taxonomy. But it was not until the early 1960s that the first clear evidence showed that relationships determined from molecular sequences were congruent with traditional systematics (Margoliash, 1963; Doolittle and Blomback, 1964; Margoliash and Smith, 1965; Zuckerkandl and Pauling, 1965a, 1965b). The paper by Fitch and Margoliash (1967) on cytochrome c sequences from fungi, arthropods, and vertebrates for the first time showed a wide audience what could be done with molecular data. We should particularly remember the pioneering work of Margaret Dayhoff, who first produced a coherent presentation of both data and methods for protein sequences (Dayhoff et al., 1965; Dayhoff and Eck, 1968). Hori and Osawa (Hori, 1975; Hori and Osawa, 1986) will be remembered for exploiting nucleic acid sequences and for the first conspectus of almost the whole range of living organisms based on 5S ribosomal sequences.
The numerical taxonomic contribution in the broad sense of Woese and his colleagues, which led to the discovery of a second prokaryote clade, the archaebacteria, is also most important (Woese, 1981, 1987). Kirsch (1969) and Moore et al. (1973) gave important insights into the ultrametric properties of molecular data. Most of the molecular studies have been directed to phylogenetics, but some, such as studies of DNA-DNA pairing, are phenetic in orientation. Genomic data can give phenetic results because phenetic relationships are not necessarily phenotypic; phenetic relationships strictly estimate overall resemblance, not time; cladistic relationships strictly estimate time to common ancestors, not resemblance (or surrogates for time and resemblance).

Numerical Identification

Numerical methods for identification have become extremely powerful. Their roots in discriminant functions and Pearson's coefficient of racial likeness have thrown up several simpler methods based on taxonomic distances. These distances are measured between an unknown specimen and various taxon centers, and the closest taxon to the unknown is taken to be the most likely identification. This process is polythetic and phenetic in philosophy. Most of this work has been in microbiology, where the first suggestions on how to apply numerical taxonomy to identification were made by Beers and Lockhart (1962). Further advances, including probabilistic features, have been made by Gyllenberg (1964, 1965), Dybowski and Franklin (1968), and Lapage and his colleagues (Lapage et al., 1970). Complementary methods have been developed in botany by workers such as Pankhurst (1970, 1975, 1978), Morse (1974), Duncan and Meacham (1986), and Wilson and Partridge (1986). Australian botanists are now producing extensive databases combined with software for interactive numerical identification (Hyland and Whiffin, 1994; Watson and Dallwitz, 1994).
These methods are now being applied also in zoology (Fortuner and Wong, 1984; Fortuner, 1993). Numerical identification is widely used in microbiology, where it is generally based on matrices that contain the percentage of positive test results for the various species. Computer programs have been developed by our team at Leicester to evaluate the quality of such databases (Sackin, 1987; Priest and Williams, 1993). These databases, with computer software, are now incorporated into automated instruments that identify microbes by the use of manufactured test kits. Such instruments are now being shown at medical laboratory trade fairs.
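
The use of a percentage-positive matrix for identification can be illustrated with a minimal sketch. It assumes a normalized-likelihood score of the kind commonly used in probabilistic identification, and it should not be read as the method of any particular program or instrument mentioned above. The taxon names, test names, and percentages are invented for illustration. The clamping of probabilities to the range 0.01 to 0.99 and the treatment of tests as independent are likewise assumptions of the sketch.

```python
from math import prod

# Hypothetical identification matrix: percentage of strains of each taxon
# that give a positive result in each test. All names and values are invented.
MATRIX = {
    "Taxon A": {"test1": 99, "test2": 5, "test3": 90},
    "Taxon B": {"test1": 10, "test2": 95, "test3": 85},
    "Taxon C": {"test1": 95, "test2": 90, "test3": 1},
}


def clamp(p, lo=0.01, hi=0.99):
    """Keep a probability away from 0 and 1 so one test cannot veto a taxon (assumed limits)."""
    return min(hi, max(lo, p))


def likelihood(profile, results):
    """Probability of the observed results for one taxon, treating tests as independent."""
    return prod(
        clamp(profile[test] / 100) if positive else 1 - clamp(profile[test] / 100)
        for test, positive in results.items()
    )


def identify(matrix, results):
    """Return (taxon, normalized likelihood) pairs, best match first."""
    raw = {taxon: likelihood(profile, results) for taxon, profile in matrix.items()}
    total = sum(raw.values())
    scores = {taxon: value / total for taxon, value in raw.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Test results for an unknown isolate: True = positive, False = negative.
    unknown = {"test1": True, "test2": False, "test3": True}
    for taxon, score in identify(MATRIX, unknown):
        print(f"{taxon}: {score:.4f}")
```

In practice an identification would be accepted only if the best score exceeded some chosen threshold; the threshold, like the matrix itself, would need to be evaluated against reference strains.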

Publication date: 2007